Dually Interactive Matching Network for Personalized Response Selection in Retrieval-Based Chatbots

2019-09-25

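This paper studies personalized retrieval-based dialogue systems. Conventional personalization methods use a persona encoding to enhance the context representation and then match the result against the response. This paper instead proposes the DIM model, whose core idea is dual matching: context against response, and persona against response.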

paper: https://drive.google.com/open?id=1WIBSG2pRGhpVmkV4OGNx0X2xQzSu-v2C
code: https://github.com/JasonForJoy/DIM
source: EMNLP 2019

Introduction

This paper studies personalized dialogue systems, as shown in the figure below:

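"Personalizing Dialogue Agents: I have a dog, do you have pets too?" proposed a personalized model. It first uses the context representation to attend over each persona sentence and obtain a persona representation, then fuses this persona encoding with the context, and finally computes the similarity with each candidate response. This approach has two problems: (1) the context is treated as a single unit, which ignores the differences between its utterances; (2) the interaction between the user's persona and the response is not considered when the persona representation is computed.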

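This paper proposes a retrieval-based personalized dialogue system built on the interactive matching network (IMN), which models fine-grained interaction between context and persona. It also proposes the dually interactive matching network (DIM), which performs dual matching: context against response, and persona against response.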

IMN-Based Persona Fusion

Let $\{\mathbf{u}_{m}\}_{m=1}^{n_{c}}$ denote the context utterances, $\mathbf{c}$ the concatenation of the context utterances, and $\{\mathbf{p}_{n}\}_{n=1}^{n_{p}}$ the profile sentences. The method in Fig. (a) computes the persona-enhanced context representation as follows:
$$
\mathbf{c}^{+}=\mathbf{c}+\sum_{n} \operatorname{Softmax}\left(\mathbf{c} \cdot \mathbf{p}_{n}\right) \mathbf{p}_{n}
$$
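
The context-level fusion above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' code: the function names (`softmax`, `dot`, `fuse_persona`) and the 2-dimensional toy embeddings are assumptions made for the example, and the softmax is taken over the profile index $n$.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def fuse_persona(query, profiles):
    """Return query + sum_n softmax_n(query . p_n) * p_n."""
    weights = softmax([dot(query, p) for p in profiles])
    fused = list(query)
    for w, p in zip(weights, profiles):
        for i, value in enumerate(p):
            fused[i] += w * value
    return fused

# Toy embeddings (made-up numbers, for illustration only).
c = [1.0, 0.0]                        # whole-context embedding
profiles = [[1.0, 0.0], [0.0, 1.0]]   # two profile-sentence embeddings
c_plus = fuse_persona(c, profiles)    # persona-enhanced context
```

Because the attention weights sum to one, the vector added to `c` is a convex combination of the profile embeddings, and the profile sentence most similar to the context receives the largest weight.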

The method in Fig. (b) instead handles each utterance in the context separately:
$$
\mathbf{u}_{m}^{+}=\mathbf{u}_{m}+\sum_{n} \operatorname{Softmax}\left(\mathbf{u}_{m} \cdot \mathbf{p}_{n}\right) \mathbf{p}_{n}
$$

An aggregation layer (for example an RNN or an attention mechanism) then produces the enhanced context representation:
$$
\mathbf{c}^{+}=\text {Aggregation }\left(\left\{\mathbf{u}_{m}^{+}\right\}_{m=1}^{n_{c}}\right)
$$
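
The utterance-level variant can be sketched as follows. The text only says the aggregation may be an RNN or an attention mechanism, so the attention pooling with a query vector used here is an assumption chosen for brevity. The names `fuse_utterance`, `aggregate`, and `query` are hypothetical, and in a trained model the query would be a learned parameter rather than a fixed vector.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    """Sum of the vectors, each scaled by its weight."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def fuse_utterance(u, profiles):
    """u_m^+ = u_m + sum_n softmax_n(u_m . p_n) * p_n."""
    attended = weighted_sum(softmax([dot(u, p) for p in profiles]), profiles)
    return [a + b for a, b in zip(u, attended)]

def aggregate(enhanced, query):
    """Attention pooling over the enhanced utterances (assumed aggregation)."""
    weights = softmax([dot(query, u) for u in enhanced])
    return weighted_sum(weights, enhanced)

# Toy embeddings (made-up numbers, for illustration only).
utterances = [[1.0, 0.0], [0.0, 1.0]]
profiles = [[1.0, 0.0], [0.0, 1.0]]
enhanced = [fuse_utterance(u, profiles) for u in utterances]
# A zero query gives uniform weights, so this reduces to mean pooling.
c_plus = aggregate(enhanced, query=[0.0, 0.0])
```

Unlike the context-level version, each utterance here attends to the profile sentences on its own, so different utterances can pick up different parts of the persona before they are pooled.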

Dually Interactive Matching Network

Sentence Encoding Layer

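Matching Layer: The matching layer is implemented with cross-attention; see the original paper for the details.
Aggregation Layer: The aggregation layer maps the output of the matching layer to a single feature vector, as follows. First, the matching-layer outputs are each passed through a shared BiLSTM (note that both the persona and the context consist of multiple sentences, so each corresponds to multiple vector sequences):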